291 research outputs found

    An integrated approach to the interpretation of Single Amino Acid Polymorphisms within the framework of CATH and Gene3D

    Background: The phenotypic effects of sequence variations in protein-coding regions come about primarily via their effects on the resulting structures, for example by disrupting active sites or affecting structural stability. To better understand the mechanisms behind known mutant phenotypes, and to predict the effects of novel variations, biologists need tools to gauge the impacts of DNA mutations in terms of their structural manifestation. Although many mutations occur within domains whose structure has been solved, many more occur within genes whose protein products have not been structurally characterized. Results: Here we present 3DSim (3D Structural Implication of Mutations), a database and web application facilitating the localization and visualization of single amino acid polymorphisms (SAAPs) mapped to protein structures even where the structure of the protein of interest is unknown. The server displays information on 6514 point mutations, 4865 of them known to be associated with disease. These polymorphisms are drawn from SAAPdb, which aggregates data from various sources including dbSNP and several pathogenic mutation databases. While the SAAPdb interface displays mutations on known structures, 3DSim projects mutations onto known sequence domains in Gene3D. This resource contains sequences annotated with domains predicted to belong to structural families in the CATH database. Mappings between domain sequences in Gene3D and known structures in CATH are obtained using a MUSCLE alignment. 1210 three-dimensional structures corresponding to CATH structural domains are currently included in 3DSim; these domains are distributed across 396 CATH superfamilies, and provide a comprehensive overview of the distribution of mutations in structural space. Conclusion: The server is publicly available at http://3DSim.bioinfo.cnio.es/.
In addition, the database containing the mapping between SAAPdb, Gene3D and CATH is available on request, and most of the functionality is available through programmatic web service access.

    Finding the "Dark Matter" in Human and Yeast Protein Network Prediction and Modelling

    Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different levels of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or "dark matter" of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems.
In any case, these predictions provide a valuable guide to these experimentally elusive regions.

    A realistic assessment of methods for extracting gene/protein interactions from free text

    Background: The automated extraction of gene and/or protein interactions from the literature is one of the most important targets of biomedical text mining research. In this paper we present a realistic evaluation of gene/protein interaction mining relevant to potential non-specialist users. Hence we have specifically avoided methods that are complex to install or require reimplementation, and we coupled our chosen extraction methods with a state-of-the-art biomedical named entity tagger. Results: Our results show that performance across different evaluation corpora is extremely variable; that the use of tagged (as opposed to gold standard) gene and protein names has a significant impact on performance, with a drop in F-score of over 20 percentage points being commonplace; and that a simple keyword-based benchmark algorithm, when coupled with a named entity tagger, outperforms two of the tools most widely used to extract gene/protein interactions. Conclusion: In terms of availability, ease of use and performance, the potential non-specialist user community interested in automatically extracting gene and/or protein interactions from free text is poorly served by current tools and systems. The public release of extraction tools that are easy to install and use, and that achieve state-of-the-art levels of performance, should be treated as a high priority by the biomedical text mining community.

    A comparison of common programming languages used in bioinformatics

    Background: The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Results: Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. Conclusion: This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.
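For readers unfamiliar with the first of the three benchmarked methods, the following is a minimal Python sketch of Sellers-style approximate string matching: an edit-distance dynamic programme in which the pattern may start at any position in the text. It is not the benchmark's own code (that is at the URL given in the abstract); the function names `sellers_min_distances` and `approximate_matches` and the unit edit costs are assumptions made here for illustration.

```python
from typing import List


def sellers_min_distances(pattern: str, text: str) -> List[int]:
    """For each end position j in text (0..len(text)), return the minimum
    edit distance between pattern and any substring of text ending at j."""
    # Row 0 is all zeros: an empty pattern matches anywhere at no cost,
    # which is what lets a match start at any position in the text.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        curr = [i]  # i pattern characters against an empty text prefix
        for j, t in enumerate(text, start=1):
            cost = 0 if p == t else 1
            curr.append(min(prev[j - 1] + cost,  # match or substitution
                            prev[j] + 1,         # pattern character unmatched
                            curr[j - 1] + 1))    # extra text character
        prev = curr
    return prev


def approximate_matches(pattern: str, text: str, k: int) -> List[int]:
    """End positions (exclusive indices into text) of matches with at most k edits."""
    distances = sellers_min_distances(pattern, text)
    return [j for j, d in enumerate(distances) if j > 0 and d <= k]


if __name__ == "__main__":
    print(approximate_matches("ACGT", "TTACGTTT", 0))
```

The sketch keeps only two rows of the table at a time, so memory grows with the text length rather than with the product of pattern and text lengths.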

    Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts

    To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist focusing specifically on the detection of a limited set of relation types. For systems biology, generic approaches for the detection of a multitude of relation types which in addition are able to process large text corpora are needed but the number of systems meeting both requirements is very limited. We introduce the use of SENNA (“Semantic Extraction using a Neural Network Architecture”), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. 
The presented approach bridges the gap between fast, co-occurrence-based approaches lacking semantic relations and highly specialized and computationally demanding NLP approaches.
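The abstract reports precision and recall (0.71/0.43) but no combined score. As a small worked illustration, the balanced F-measure, the harmonic mean of the two, can be computed from those figures. The function name `f1_score` is ours, and the resulting value is derived here rather than reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as stated in the abstract above.
    precision, recall = 0.71, 0.43
    print(round(f1_score(precision, recall), 3))
```

Because the harmonic mean is pulled towards the smaller value, the combined score sits closer to the recall of 0.43 than a simple average would.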

    Benchmarking natural-language parsers for biological applications using dependency graphs

    BACKGROUND: Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. RESULTS: Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. CONCLUSION: Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques.

    Prevalence and correlates of frailty in an older rural African population: findings from the HAALSI cohort study

    Background: Frailty is a key predictor of death and dependency, yet little is known about frailty in sub-Saharan Africa despite rapid population ageing. We describe the prevalence and correlates of phenotypic frailty using data from the Health and Aging in Africa: Longitudinal Studies of an INDEPTH Community cohort. Methods: We analysed data from rural South Africans aged 40 and over. We used low grip strength, slow gait speed, low body mass index, and combinations of self-reported exhaustion, decline in health, low physical activity and high self-reported sedentariness to derive nine variants of a phenotypic frailty score. Each frailty category was compared with self-reported health, subjective wellbeing, impairment in activities of daily living and the presence of multimorbidity. Cox regression analyses were used to compare subsequent all-cause mortality for non-frail (score 0), pre-frail (score 1–2) and frail participants (score 3+). Results: Five thousand fifty-nine individuals (mean age 61.7 years, 2714 female) were included in the analyses. The nine frailty score variants yielded a range of frailty prevalences (5.4% to 13.2%). For all variants, rates were higher in women than in men, and rose steeply with age. Frailty was associated with worse subjective wellbeing and worse self-reported health. Both pre-frailty and frailty were associated with a higher risk of death during a mean 17-month follow-up for all score variants (hazard ratios 1.29 to 2.41 for pre-frail vs non-frail; hazard ratios 2.65 to 8.91 for frail vs non-frail). Conclusions: Phenotypic frailty could be measured in this older South African population, and was associated with worse health, wellbeing and earlier death.
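The score bands quoted in the abstract (0, 1–2, 3+) can be sketched as a small classifier. This is only an illustration: it assumes the score is a simple count of criteria met, which the abstract does not state explicitly. The criterion keys and the function names `frailty_score` and `frailty_category` are hypothetical and do not reproduce any of the study's nine score variants.

```python
from typing import Mapping


def frailty_score(criteria: Mapping[str, bool]) -> int:
    """Count how many frailty criteria are met (assumed scoring rule)."""
    return sum(1 for met in criteria.values() if met)


def frailty_category(score: int) -> str:
    """Map a score to the bands given in the abstract: 0, 1-2, 3+."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score == 0:
        return "non-frail"
    if score <= 2:
        return "pre-frail"
    return "frail"


if __name__ == "__main__":
    # Hypothetical participant; keys are illustrative, not the study's variables.
    participant = {
        "low_grip_strength": True,
        "slow_gait_speed": True,
        "low_bmi": False,
        "exhaustion": True,
        "low_physical_activity": False,
    }
    print(frailty_category(frailty_score(participant)))
```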

    Non-neutralizing antibodies elicited by recombinant Lassa-Rabies vaccine are critical for protection against Lassa fever

    Lassa fever (LF), caused by Lassa virus (LASV), is a viral hemorrhagic fever for which no approved vaccine or potent antiviral treatment is available. LF is a WHO priority disease and, together with rabies, a major health burden in West Africa. Here we present the development and characterization of an inactivated recombinant LASV and rabies vaccine candidate (LASSARAB) that expresses a codon-optimized LASV glycoprotein (coGPC) and is adjuvanted by a TLR-4 agonist (GLA-SE). LASSARAB elicits a lasting humoral response against LASV and RABV in both mouse and guinea pig models, and it protects both guinea pigs and mice against LF. We also demonstrate a previously unexplored role for non-neutralizing LASV GPC-specific antibodies as a major mechanism of protection by LASSARAB against LF through antibody-dependent cellular functions. Overall, these findings demonstrate an effective inactivated LF vaccine and elucidate a novel humoral correlate of protection for LF. Funding: NIH grants R01 AI105204 (to M.J.S.), the Jefferson Vaccine Center, and the Fundação para a Ciência e Tecnologia (FCT) scholarship PD/BD/105847/2014 (to T.A.-M.). This work was also funded in part through the NIAID Division of Intramural Research and the NIAID Division of Clinical Research, Battelle Memorial Institute’s prime contract with the U.S. National Institute of Allergy and Infectious Diseases (NIAID) under Contract No. HHSN272200700016I.

    Corporate philanthropy through the lens of ethical subjectivity

    The dynamic organisational processes in businesses dilute the boundaries between the individual, organisational, and societal drivers of corporate philanthropy. This creates a complex framework in which charitable project selection occurs. Using the example of European tour operators, this study investigates the mechanisms through which companies invest in charitable projects in overseas destinations. Inextricably linked to this is the increasing contestation by local communities as to how they are able to engage effectively with tourism in order to realise the benefits tourism development can bring. This research furthers such debates by exploring the processes through which tour operators facilitate community development through charitable giving. Findings show that, with no formal frameworks in existence, project selection depends upon emergent strategies that connect the professional with the personal, with trust being positioned as a central driver of these informal processes. Discretionary responsibilities are reworked through business leaders’ commitment to responsible business practices and the ethical subjectivity guiding these processes.